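We build a system that lets anyone control a robot hand and arm simply by demonstrating motions with their own hand. The robot observes the human operator through a single RGB camera and imitates their actions in real time. Human and robot hands differ in shape, size, and joint structure, and performing this translation from a single uncalibrated camera is a highly underconstrained problem. Moreover, the retargeted trajectories must effectively execute tasks on a physical robot, which requires them to be temporally smooth and free of self-collisions. Our key insight is that while paired human-robot correspondence data is expensive to collect, the internet contains a massive corpus of rich and diverse human hand videos. We leverage this data to train a system that understands human hands and retargets a human video stream into a robot hand-arm trajectory that is smooth, swift, safe, and semantically similar to the guiding demonstration. We demonstrate that it enables previously untrained people to teleoperate a robot on a variety of dexterous manipulation tasks. Our low-cost, glove-free, marker-free teleoperation system makes robot teaching more accessible, and we hope it can help robots learn to act autonomously in the real world. Videos: https://robotic-telekinesis.github.io/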
translated by 谷歌翻译
In this paper, we propose Adam-Hash: an adaptive and dynamic multi-resolution hashing data-structure for fast pairwise summation estimation. Given a data-set $X \subset \mathbb{R}^d$, a binary function $f:\mathbb{R}^d\times \mathbb{R}^d\to \mathbb{R}$, and a point $y \in \mathbb{R}^d$, the Pairwise Summation Estimate is defined as $\mathrm{PSE}_X(y) := \frac{1}{|X|} \sum_{x \in X} f(x,y)$. For any given data-set $X$, the goal is to design a data-structure that, given any query point $y \in \mathbb{R}^d$, approximately estimates $\mathrm{PSE}_X(y)$ in time that is sub-linear in $|X|$. Prior works on this problem have focused exclusively on the case where the data-set is static and the queries are independent. In this paper, we design a hashing-based PSE data-structure that works for the more practical \textit{dynamic} setting, in which insertions, deletions, and replacements of points are allowed. Moreover, our proposed Adam-Hash is also robust to adaptive PSE queries, where an adversary can choose query $q_j \in \mathbb{R}^d$ depending on the outputs of previous queries $q_1, q_2, \dots, q_{j-1}$.
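To make the quantity being estimated concrete, the following minimal sketch computes $\mathrm{PSE}_X(y)$ exactly in time linear in $|X|$ and compares it with a plain uniform-sampling estimate. This is not Adam-Hash: the function names, the Gaussian kernel chosen for $f$, and the sampling baseline are all assumptions made for illustration only.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """One possible choice of f(x, y); the kernel is an illustrative assumption."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def pse_exact(X, y, f):
    """Exact PSE_X(y) = (1/|X|) * sum over x in X of f(x, y). Costs O(|X|) per query."""
    return sum(f(x, y) for x in X) / len(X)


def pse_sampled(X, y, f, num_samples, rng):
    """Baseline estimate from uniform samples drawn with replacement.

    This is only a naive reference point, not the hashing-based estimator.
    """
    sample = [rng.choice(X) for _ in range(num_samples)]
    return sum(f(x, y) for x in sample) / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(10_000)]
    y = (0.5, -0.25)
    print("exact  :", pse_exact(X, y, gaussian_kernel))
    print("sampled:", pse_sampled(X, y, gaussian_kernel, 200, rng))
```

The exact routine is the linear-time reference that a sub-linear data-structure is meant to approximate.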
Our earlier research built a virtual shake robot in simulation to study the dynamics of precariously balanced rocks (PBR), which are negative indicators of earthquakes in nature. The simulation studies need validation through physical experiments. For this purpose, we developed Shakebot, a low-cost (under $2,000), open-source shake table to validate simulations of PBR dynamics and facilitate other ground motion experiments. The Shakebot is a custom one-dimensional prismatic robotic system with perception and motion software developed using the Robot Operating System (ROS). We adapted affordable and high-accuracy components from 3D printers, particularly a closed-loop stepper motor for actuation and a toothed belt for transmission. The stepper motor enables the bed to reach a maximum horizontal acceleration of 11.8 m/s^2 (1.2 g), and velocity of 0.5 m/s, when loaded with a 2 kg scale-model PBR. The perception system of the Shakebot consists of an accelerometer and a high frame-rate camera. By fusing camera-based displacements with acceleration measurements, the Shakebot is able to carry out accurate bed velocity estimation. The ROS-based perception and motion software simplifies the transition of code from our previous virtual shake robot to the physical Shakebot. The reuse of the control programs ensures that the implemented ground motions are consistent for both the simulation and physical experiments, which is critical to validate our simulation experiments.
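The abstract does not say how camera displacements and accelerometer readings are fused. As one hedged illustration, a complementary filter blends an integrated-acceleration velocity with a differentiated-displacement velocity. The function name `fuse_velocity` and the blending weight `alpha` are hypothetical, not taken from the Shakebot software.

```python
def fuse_velocity(displacements, accelerations, dt, alpha=0.98):
    """Complementary-filter velocity estimate (illustrative assumption).

    displacements: bed positions from the camera, one per time step (m)
    accelerations: accelerometer readings at the same time steps (m/s^2)
    dt:            sampling interval (s)
    alpha:         weight on the accelerometer path; (1 - alpha) goes to the camera path
    """
    if len(displacements) != len(accelerations):
        raise ValueError("displacements and accelerations must have equal length")

    velocity = 0.0  # assume the bed starts at rest
    estimates = [velocity]
    for k in range(1, len(displacements)):
        # Velocity predicted by integrating acceleration over one step.
        v_from_accel = velocity + accelerations[k] * dt
        # Velocity from a finite difference of camera displacements.
        v_from_camera = (displacements[k] - displacements[k - 1]) / dt
        velocity = alpha * v_from_accel + (1.0 - alpha) * v_from_camera
        estimates.append(velocity)
    return estimates
```

A larger `alpha` trusts the accelerometer more over short time scales, while the camera term limits the drift that pure integration would accumulate.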
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems on the SLU intent detection task: 1) text-based, 2) lattice-based, and 3) a novel multimodal approach. Our work provides a comprehensive analysis of the performance achievable by different state-of-the-art SLU systems under different circumstances, e.g., automatically vs. manually generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) output allows SLU systems to improve over the 1-best setup (4% relative improvement). However, crossmodal approaches, i.e., learning from acoustic and text embeddings, obtain performance similar to the oracle setup and a relative improvement of 18% over the 1-best configuration. Thus, crossmodal architectures represent a good alternative for overcoming the limitations of working purely with automatically generated textual data.
We revisit a simple Learning-from-Scratch baseline for visuo-motor control that uses data augmentation and a shallow ConvNet. We find that this baseline has competitive performance with recent methods that leverage frozen visual representations trained on large-scale vision datasets.
Developing robots that are capable of many skills and generalization to unseen scenarios requires progress on two fronts: efficient collection of large and diverse datasets, and training of high-capacity policies on the collected data. While large datasets have propelled progress in other fields like computer vision and natural language processing, collecting data of comparable scale is particularly challenging for physical systems like robotics. In this work, we propose a framework to bridge this gap and better scale up robot learning, under the lens of multi-task, multi-scene robot manipulation in kitchen environments. Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training. In the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation as part of the augmentation stage, and the significant improvement of training efficiency by using pretrained out-of-domain visual representations at the compression stage. Experimentally, we demonstrate that 1) on a real robot setup, CACTI enables efficient training of a single policy capable of 10 manipulation tasks involving kitchen objects, and robust to varying layouts of distractor objects; 2) in a simulated kitchen environment, CACTI trains a single policy on 18 semantic tasks across up to 50 layout variations per task. The simulation task benchmark and augmented datasets in both real and simulated environments will be released to facilitate future research.
Poor sample efficiency continues to be the primary challenge for deploying deep Reinforcement Learning (RL) algorithms in real-world applications, and in particular for visuo-motor control. Model-based RL has the potential to be highly sample efficient by concurrently learning a world model and using synthetic rollouts for planning and policy improvement. However, in practice, sample-efficient learning with model-based RL is bottlenecked by the exploration challenge. In this work, we find that leveraging just a handful of demonstrations can dramatically improve the sample efficiency of model-based RL. Simply appending demonstrations to the interaction dataset, however, does not suffice. We identify key ingredients for leveraging demonstrations in model learning -- policy pretraining, targeted exploration, and oversampling of demonstration data -- which form the three phases of our model-based RL framework. We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks compared to prior approaches in the low data regime (100K interaction steps, 5 demonstrations). Code and videos are available at: https://nicklashansen.github.io/modemrl
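The idea of oversampling demonstration data can be illustrated with a small batch sampler that draws a fixed share of every batch from the demonstration buffer, however small that buffer is relative to the interaction data. This is a sketch under stated assumptions, not the authors' implementation: `sample_batch` and `demo_fraction` are hypothetical names, and both buffers are assumed to be non-empty lists.

```python
import random


def sample_batch(demo_buffer, interaction_buffer, batch_size, demo_fraction, rng):
    """Draw a training batch in which demonstrations are deliberately over-represented.

    demo_fraction is the share of each batch taken from demo_buffer (hypothetical
    parameter). Sampling is with replacement, so a handful of demonstrations can
    still fill their share of the batch.
    """
    if not 0.0 <= demo_fraction <= 1.0:
        raise ValueError("demo_fraction must lie in [0, 1]")
    num_demo = round(batch_size * demo_fraction)
    batch = [rng.choice(demo_buffer) for _ in range(num_demo)]
    batch += [rng.choice(interaction_buffer) for _ in range(batch_size - num_demo)]
    rng.shuffle(batch)  # avoid a fixed demo-first ordering inside the batch
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    demos = [("demo", i) for i in range(5)]
    interactions = [("env", i) for i in range(100_000)]
    batch = sample_batch(demos, interactions, batch_size=8, demo_fraction=0.5, rng=rng)
    print(batch)
```

With 5 demonstrations against 100K interaction steps, uniform sampling over the merged data would almost never pick a demonstration, which is the gap a fixed per-batch share is meant to close.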
The Multi-Objective Shortest Path Problem, typically posed on a graph, determines a set of paths from a start vertex to a destination vertex while optimizing multiple objectives. In general, no single solution path can simultaneously optimize all the objectives, so the problem seeks a set of so-called Pareto-optimal solutions. To address this problem, several Multi-Objective A* (MOA*) algorithms were recently developed to quickly compute solutions with quality guarantees. However, these MOA* algorithms often suffer from high memory usage, especially when the branching factor (i.e., the number of neighbors of any vertex) of the graph is large. This work thus aims at reducing the high memory consumption of MOA* with little increase in the runtime. In this paper, we first extend the notion of "partial expansion" (PE) from single-objective to multi-objective search and then fuse this new PE technique with EMOA*, a recent runtime-efficient MOA* algorithm. Furthermore, the resulting algorithm, PE-EMOA*, can balance runtime and memory efficiency by tuning a user-defined hyper-parameter.
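The notion of Pareto-optimality used above can be sketched as a dominance check over path cost vectors. The snippet below is only an illustration of that definition, assuming every objective is minimized; it is not MOA*, EMOA*, or PE-EMOA*, and the function names are hypothetical.

```python
def dominates(a, b):
    """Return True if cost vector a dominates b (all objectives minimized).

    a dominates b when a is no worse in every objective and strictly better
    in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs):
    """Keep the cost vectors that no other vector dominates (quadratic-time filter)."""
    front = []
    for candidate in costs:
        if any(dominates(other, candidate) for other in costs):
            continue  # some other path is at least as good everywhere and better somewhere
        if candidate not in front:
            front.append(candidate)
    return front


if __name__ == "__main__":
    # Each tuple is (travel time, risk) for one start-to-goal path.
    path_costs = [(1, 5), (2, 2), (3, 3), (5, 1)]
    print(pareto_front(path_costs))
```

In this example the path with cost (3, 3) is dropped because (2, 2) is better in both objectives, while the remaining three paths are mutually incomparable.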
In this era of pandemic, the future of the healthcare industry has never been more exciting. Artificial intelligence and machine learning (AI & ML) present opportunities to develop solutions that cater to very specific needs within the industry. Deep learning in healthcare has become incredibly powerful for supporting clinics and transforming patient care in general. It is increasingly being applied to detect clinically important features in images beyond what can be perceived by the naked human eye. Chest X-ray imaging is one of the most common clinical methods for diagnosing a number of diseases, such as pneumonia and lung cancer, and many other abnormalities, such as lesions and fractures. Proper diagnosis of a disease from X-ray images is often a challenging task even for expert radiologists, and there is a growing need for computerized support systems due to the large amount of information encoded in X-ray images. The goal of this paper is to develop a lightweight solution to detect 14 different chest conditions from an X-ray image. Given an X-ray image as input, our classifier outputs a label vector indicating which of the 14 disease classes the image falls into. Along with the image features, we also use non-image features available in the data, such as X-ray view type, age, and gender. The original study conducted by the Stanford ML Group is our baseline; it focuses on predicting 5 diseases. Our aim is to improve upon previous work, expand prediction to 14 diseases, and provide insight for future chest radiography research.
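The configuration manifolds of parallel manipulators exhibit more nonlinearity than those of serial manipulators. Qualitatively, they can be seen to possess extra folds. When such a manifold is projected onto a space of engineering relevance, such as the output workspace or the input actuator space, these folds cast edges that exhibit nonsmooth behavior. For example, inside the global workspace boundary of a five-bar linkage there appear several local workspace boundaries that constrain only certain output modes of the mechanism. When these projections are studied on their own rather than the configuration manifold itself, such boundaries appear in both the input and the output projections. In particular, the design of nonsymmetric parallel manipulators has been confounded by exotic projections in their input and output spaces. In this paper, we represent the configuration space with a radius graph, then weight each edge by using homotopy continuation to quantify transmission quality. We then employ a graph path planner to approximate geodesics between configuration points that avoid regions of low transmission quality. Our approach automatically generates paths capable of transitioning between non-neighboring output modes, a motion that involves osculating multiple workspace boundaries (local, global, or both). We apply our technique to two nonsymmetric five-bar examples that demonstrate how transmission properties and other characteristics of the workspace can be selected by switching output modes.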
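The pipeline of building a radius graph over sampled configurations and running a graph path planner on it can be sketched as follows. This is a minimal illustration under stated assumptions: the edge cost is passed in as a plain function (`edge_cost`, a hypothetical stand-in for a transmission-quality measure, which the paper obtains with homotopy continuation and which is not reproduced here), and Dijkstra's algorithm is used as one possible planner.

```python
import heapq
import math


def radius_graph(points, radius, edge_cost):
    """Connect every pair of sampled points closer than `radius`.

    edge_cost(p, q) must return a non-negative weight; here it is a placeholder
    for whatever quality measure should be penalized along the edge.
    """
    graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                weight = edge_cost(points[i], points[j])
                graph[i].append((j, weight))
                graph[j].append((i, weight))
    return graph


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm; returns the list of node indices or None if unreachable."""
    best = {start: 0.0}
    parent = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            path = [node]
            while node in parent:
                node = parent[node]
                path.append(node)
            return path[::-1]
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, weight in graph[node]:
            new_cost = cost + weight
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return None


if __name__ == "__main__":
    samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    graph = radius_graph(samples, radius=1.1, edge_cost=math.dist)
    print(shortest_path(graph, 0, 3))
```

Replacing `math.dist` with a cost that grows where transmission quality is poor would steer the planner away from those regions, which is the role the edge weights play in the approach described above.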